The first class, represented in TransProR, comprises univariate statistical methods such as DESeq2, edgeR, limma, and the Wilcoxon rank-sum test. These techniques test each feature, such as the expression level of a single gene, independently for its association with an outcome variable such as disease status. Each method is defined by its own statistical model and its assumptions about data distribution and variability, and none of them directly accounts for interactions between features. Their strength lies in identifying individual features that differ significantly from control conditions.
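To make the univariate idea concrete, the following minimal sketch applies a Wilcoxon rank-sum test to each feature separately. It is an illustration only, not TransProR code: the data are synthetic, and the names `expr`, `labels`, and `p_values` are hypothetical. It assumes NumPy and SciPy are available.

```python
import numpy as np
from scipy.stats import ranksums

# Synthetic expression matrix: 40 samples x 5 features (all values are made up).
rng = np.random.default_rng(0)
labels = np.array([0] * 20 + [1] * 20)  # 0 = control, 1 = disease
expr = rng.normal(size=(40, 5))
expr[labels == 1, 2] += 3.0  # only feature 2 differs between the groups

# Test every feature on its own; no test sees any other feature.
p_values = np.array([
    ranksums(expr[labels == 0, j], expr[labels == 1, j]).pvalue
    for j in range(expr.shape[1])
])

ranking = np.argsort(p_values)  # most significant feature first
print("p-values:", p_values)
print("top-ranked feature:", ranking[0])
```

Because each test uses a single column, two features that are informative only in combination would both be missed by this kind of screen. In practice the p-values would also need a multiple-testing correction, which is omitted here for brevity.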
Conversely, the second class, implemented in TransProPy, evaluates features jointly rather than one at a time. It includes the mvAUC metric, the MACFC algorithm, and the machine learning techniques integrated through AutoGluon and the auto_feature_selection function. These approaches go beyond assessing the utility of individual features and instead appraise their interactions and complementarity, seeking sets of features that act in concert to influence the outcome variable.
The mvAUC metric evaluates the global complementarity among features by quantifying how much classification capability improves when features are combined.
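The notion of complementarity can be illustrated with a small experiment. The sketch below is not the mvAUC definition, which is not given here; it substitutes a plain cross-validated AUC from logistic regression and compares the AUC of two features used together with the best AUC of either feature alone. The data are synthetic and all names (`cv_auc`, `gain`, `shared_noise`) are hypothetical. It assumes NumPy and scikit-learn are available.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

# Synthetic data: x1 carries the class signal buried in strong noise,
# x2 carries only that same noise. Neither is a strong predictor alone,
# but x1 - x2 recovers the signal almost exactly.
rng = np.random.default_rng(0)
n = 200
y = np.repeat([0, 1], n // 2)
shared_noise = rng.normal(scale=3.0, size=n)
signal = y + rng.normal(scale=0.3, size=n)
X = np.column_stack([signal + shared_noise, shared_noise])


def cv_auc(columns):
    """AUC of pooled 5-fold out-of-fold scores for the given feature columns."""
    scores = cross_val_predict(
        LogisticRegression(), X[:, columns], y, cv=5, method="decision_function"
    )
    return roc_auc_score(y, scores)


auc_x1 = cv_auc([0])
auc_x2 = cv_auc([1])
auc_pair = cv_auc([0, 1])

# Complementarity gain: improvement of the pair over the best single feature.
gain = auc_pair - max(auc_x1, auc_x2)
print(f"x1 alone: {auc_x1:.3f}, x2 alone: {auc_x2:.3f}, together: {auc_pair:.3f}")
print(f"gain from combining: {gain:.3f}")
```

A univariate screen would rank x2 as uninformative, yet adding it to x1 sharply improves classification. That improvement is the kind of effect a complementarity-aware metric is meant to capture.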
New_MACFCmain, provided by TransProPy, applies the MACFC algorithm, which in turn uses mvAUC to measure feature redundancy. The algorithm captures both the novel class-relevant information that a feature contributes and its degree of redundancy with other variables. This allows it to identify complementary features efficiently and to select effective combinations of them.
By incorporating techniques such as ensemble learning and recursive feature elimination, AutoGluon and TransProPy’s auto_feature_selection function account for the overall performance of features within a predictive model.
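As a rough illustration of how ensemble learning and recursive feature elimination can work together, the sketch below wraps a random forest in scikit-learn's `RFE`. It is not the implementation of auto_feature_selection or of AutoGluon; the estimator choice, the synthetic data, and the parameter values are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic data: 200 samples x 6 features; only features 0 and 3 are informative.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 6))
X[y == 1, 0] += 2.0
X[y == 1, 3] += 2.0

# Ensemble model whose feature importances drive the elimination.
forest = RandomForestClassifier(n_estimators=50, random_state=0)

# Refit the model repeatedly, dropping the least important feature each
# round (step=1) until two features remain.
selector = RFE(estimator=forest, n_features_to_select=2, step=1).fit(X, y)

selected = np.flatnonzero(selector.support_)
print("selected features:", selected)
print("elimination ranking (1 = kept):", selector.ranking_)
```

Here a feature is judged by its contribution to the fitted model rather than by a standalone test, so its importance depends on which other features are still present at each elimination round.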